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Abstract 

Simulated wide-field images are becoming an important part of observational astronomy, either to prepare for new 
surveys or to test measurement methods. In order to efficiently explore vast parameter spaces, the computational speed 
of simulation codes is a central requirement to their implementation. We introduce the Ultra Fast Image Generator 
(UFig) which aims to bring wide-field imaging simulations to the current limits of computational capabilities. We 
achieve this goal through: (1) models of galaxies, stars and observational conditions, which, while simple, capture 
the key features necessary for realistic simulations, and (2) state-of-the-art computational and implementation opti- 
mizations. We present the performances of UFig and show that it is faster than existing public simulation codes by 
several orders of magnitude. It allows us to produce images more quickly than SExtractor needs to analyze them. 
For instance, it can simulate a typical 0.25 deg 2 Subaru SuprimeCam image (10kx8k pixels) with a 5-cr limiting mag- 
nitude of R = 26 in 30 seconds on a laptop, yielding an average simulation time for a galaxy of 30/zs. This code is 
complementary to end-to-end simulation codes and can be used as a fast, central component of observational methods 
relying on simulations. 

Keywords: Simulations, Wide-field imaging, Computational speed 
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1. Introduction 

Image simulations are becoming ubiquitous in obser- 
vational astronomy. They are intensively used in top- 
ics as diverse as extragalactic astrophysics (with pub- 
lic codes like Skymaker [4], SImage II 1 311 . Shera 02311 . 
or the simulation package developed by the Large Syn- 
optic Survey Telescope (LSST) team jlJl), CMB analy- 
sis (e.g. IU2IP . or Supernovae (e.g. Jj, |33|]). Simula- 



tions are mainly used to forecast the results of an obser- 
vation strategy and to test measurement methods. Ex- 
amples are given by the LSST simulations JH [To! [Till , 
the public Shear Testing Program (STEP) and Gravita- 
tional Lensing Accuracy Testing (GREAT) simulations 
itTHillillTl], as well as individual works (e.g. lHH). 

In this work, we focus on simulations used for ex- 
tragalactic wide-field astronomy. Depending on their 
aim, such simulations will either be minimalist, like the 
GREAT simulations which simulated simplistic individ- 
ual galaxies on a mesh, or include most of, or all rele- 
vant cosmology, astrophysics, atmosphere physics and 
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telescope characterization (LSST). The STEP project 
(using Skymaker and SImage) took an intermediate ap- 
proach, where simulations look realistic, but without fo- 
cusing much on any particular physics or observational 
apparatus. 

Depending on their emphasis, the speed of simulation 
codes can vary greatly, from hours to days to simulate a 
quarter square degree ground-based image. However, if 
a simulation package is to be used as an integrated part 
of a method and pipeline calibration process, its speed 
becomes a driving parameter: a fast simulation code, 
used to calibrate an external measurement method, al- 
lows one to efficiently explore a larger parameter space 
for the measurement method or survey. 

The aim of this paper is to introduce the Ultra Fast 
Image Generator (UFig), a fast simulation C++ code 
able to simulate realistic images on a timescale com- 
parable to that needed by SExtractor to analyze 
similar images (i.e., less than one minute for a 0.25 sq. 
deg. image); since it is widely used in astronomy and 
is well optimized, we take SExtractor as a reference. 
To this end, we adopt simple, yet realistic, models of 
galaxies, stars and observation conditions, that allow us 
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to minimize the computation load. We also bring our 
code to the current computation limits by highly opti- 
mizing our implementation through an efficient use of 
random number generators, parallelization and vector- 
ization. This fast code is thus complementary to end-to- 
end simulation packages aimed at detailed modeling of 
observational effects. 

Although the code may be used to simulate an im- 
age from various ground-based facilities and observa- 
tion conditions, in this paper we use as a practical test 
case the simulation of a typical Subaru SuprimeCam 
jilll coadded 10kx8kpixels image, processed from four 
450-seconds exposures, with a 5-cr (extended) limiting 
magnitude of Rc = 26, unless otherwise stated. 

Section [2] summarizes the model that we use for 
galaxies, stars and noise. Further details about it 
can be found in the appendices. Section [3] moti- 
vates the strategy used to optimize the simulation 
of realistic images, and Section |4] describes the im- 
plementation and the optimizations we use on the 
computational part of the problem; in particular, we 
present how we optimize random numbers generation 
and implement multithreading. Section [5] explores 
the consistency of the UFig simulations with real 
images. Section [6] shows the performances of our 
code; in particular, we show in this section how the 
execution time depends on the image's size and on 
the exposure time. We conclude in Sect. [7] Further 
details about UFig, including examples and informa- 
tion about the distribution of the code, can be found at 
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2. Model 

The simulations are based on a simple, yet realis- 
tic, modeling of galaxies, stars and noise: the models 
are summarized in this section. The appendices expand 
on the astrophysics and the methods used to assess our 
models. 



We find the probability density function (p.d.f) of Ser- 
sic indices for bright galaxies (magnitude less than 20) 
to be well represented by f(n) = exp(yV(0.3, 0.5) + 
Af(1.6,0.4)) + 0.24, where N(fi,cr 2 ) represents the 
normal distribution of mean fi and variance a 2 (see 



Appendix A I. For faint galaxies (magnitude bigger than 
20), we describe it by f(n) = exp(Af(0.2, 1)) + 0.2. 

We parametrize the magnitude distribution of galax- 
ies with a polynomial of the form log I0 (A^ < mag) = 
2,fl,mag'. Table I A. 21 summarizes the coefficients a,- for 
different filters. 

Finally, we account for the galaxies intrinsic elliptic- 
ity distribution with a 2D Gaussian (of both components 
of the ellipticity) of width cr\ — o~2 — 0.15. 

2.2. Stars and Point Spread Function 

We use a Moffat profile to account for the Point 
Spread Function (PSF). The (circular) Moffat profile is 
defined as [32]: 



I(r) = k 



1 + 



(2) 



where Iq is the value at the origin (r = 0), and a and 
/J are scale parameters depending on the observation's 
conditions. The width of the profile, a, is related to its 
FWHM and to its half-light radius r 50 . 

2.3. Noise 

We finally account for noise in the image. We first 
add Poisson noise for galaxies and stars. We then add 
the noise from various sources, like the sky brightness, 
the readout noise and the errors arising during the data 
processin g, s uch as flat-field inaccuracies. Following 

e.g. ea we define it as a Gaussian random de- 
viate with zero mean and standard deviation given by 
Eq. (1C.U . We finally correlate the noise with a Lanczos 
resampling [15]. 



2.1. Galaxies 

We assume that galaxy profiles are well described by 
the Sersic profile [38] 



I(r) = 7(r 50 ) exp 



-k 



l/n 

— I -1 

f50 



(1) 



where 7*50 is the radius of the circle enclosing 50% of 
the total flux of the galaxy, n is the Sersic index, and k 
is a constant satisfying the equation 2y(2n, k) = T(2n), 
where y(x, k) is the lower incomplete gamma-function, 
and F(x) is the gamma-function. 



3. Code requirements 

3.1. Requirements 

The main requirement of our simulation pipeline is 
that it must be fast, while providing realistic images. 
Since SExtractor has become the reference software 
for wide-field image analysis and is computationally ef- 
ficient, we use its execution time on a given image as 
our unit of time. In that sense, we want the running 
time of the cycle simulation creation (UFig) - simula- 
tion analysis (SExtractor) not to be dominated by the 
execution of UFig, hence setting the requirement that 
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UFig is not slower than SExtractor. For instance, UFig 
should simulate a typical Subaru SuprimeCam image of 
0.25 sq. deg. (made up of approximately 10,000x8,000 
pixels) with a 5-<x limiting magnitude of Rc = 26 under 
one minute. 

To meet this goal, we must define the most efficient 
way to draw galaxies and stars, their generation being 
the most expensive task of a simulation code. 

3.2. Pixel-based or photon-based? 

We can think of two ways to simulate 2D objects such 
as galaxies and stars in an image from a given light dis- 
tribution: (1) pixel-based and (2) photon-based. 

In the former case, an analytic description of the ob- 
ject is pixelized. The description can be a simple profile, 
which is simply pixelized by taking its value at the cen- 
ter of pixels, or by integrating it in pixels, or it can be 
more complex, as in a shapelets model 11341 12511 . Public 
simulation packages like Skymaker, SImage or Shera 
rely on this principle. 

In the second approach, an analytical description of 
the object is considered as the distribution of the pho- 
tons that make it up. Photons are then drawn individu- 
ally to make up the object. This approach is used e.g. 
by the LSST Simulation group QJJ] . 

The choice of the algorithm is determined by the total 
number of operations needed to create all the galaxies 
and all the stars of the simulation. These are described 
below. 

3.2.1. Number of operations for one elementary build- 
ing block 

Pixel-based approach:. In this case, the elementary 
building block of an object is one pixel. To simulate one 
pixel, we first have to draw its value from the analytical 
description of the object. The value of the PSF for that 
pixel is drawn in the same way, from an analytical de- 
scription of the PSF shape. An object should be made 
on a refined grid (i.e., using pixels smaller than those of 
the final simulations) in order to minimize approxima- 
tions of the profile at the center of pixels; analytically 
integrating over pixels allows one to get rid of those ap- 
proximations, but at the price of a more complex im- 
plementation. Then, Poisson noise must be applied to 
the pixel. Finally, the object, once created, is convolved 
with the PSF. This last step, albeit optimized by using 
FFTs, is computationally expensive, even more if the 
object is refined so that numerical errors are minimized. 

Photon-based approach:. In this case, the elementary 
building block of an object is one photon. To simulate 
one photon, we draw its position from the analytical 
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Figure 1 : Number of photons (thick black) and number of pixels (thin 
green) sampled per magnitude per square degree. Dash-dot lines rep- 
resent the contribution of galaxies, dashed lines show that of stars, and 
solid lines show the total. Diamonds and squares are photon counts 
from galaxies and stars from a typical Subaru-SuprimeCam image in 
Rc-band. 



description of the object, seen as a distribution func- 
tion. The effect of the PSF is simply to displace the 
photon; therefore, we just have to draw a random dis- 
placement from the analytical description of the PSF, 
and apply it to the position of the photon. Contrary to 
the pixel-based approach, no complex task (such as a 
convolution) has to be done at all. Finally, since photons 
are drawn individually, Poisson noise emerges naturally, 
with no need to add it eventually. 

Therefore, less operations are necessary to simulate a 
photon than to simulate a pixel. In the next subsection, 
we consider the number of photons and the number of 
pixels that we need to simulate to make an extragalactic 
Subaru SuprimeCam-like image, before concluding on 
what approach we choose. 

3.2.2. Number of photons vs number of pixels 

The number of photons coming from galaxies per 
square degree and per magnitude can be computed from 
the magnitude distribution of galaxies (Eq. IA.7I ) and the 
relation between the number of photons and the mag- 
nitude of a galaxy (Eq. IA.8t . A similar approach al- 
lows us to estimate the number of photons coming from 
stars. Integrating those functions, it is easy to estimate 
the number of photons required for a photon-by-photon 
simulation. Similarly, assuming that an average galaxy 
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is contained in a 21x21 postage stamp, and that we re- 
sample it by a factor of 5 when simulating i{H we can 
estimate the number of sampled pixels per square de- 
gree and per magnitude in a pixel-by-pixel approach. 
The same exercise can be done for stars, assuming that 
the average postage stamp size is 61x61 pixels (stars 
are on average brighter and more spread out than galax- 
ies). Fig. Q] shows the number of photons per square 
degree and per magnitude (black thick lines) and the 
number of (resampled by a factor 5) pixels required per 
square degree and per magnitude (green thin lines). In 
both cases, dash lines correspond to stars, dash-dot lines 
correspond to galaxies, and solid lines show the total 
number of photons and pixels. The symbols show pho- 
ton counts from a typical Subaru image: squares cor- 
respond to stars and diamonds to galaxies. Note that 
the star-galaxy separation, performed with SExtractor, 
confuses stars and galaxies at magnitudes less than 18, 
corresponding to saturated objects. Nevertheless, the 
total number of photons per square degree and magni- 
tude agrees extremely well with our expectations (black 
lines). 

The number of photons per square degree and per 
magnitude from galaxies remains fairly constant with 
magnitude, meaning that adding faint galaxies does not 
affect the execution time when simulating galaxies in 
a photon-based approach. On the other hand, photons 
from stars largely dominate the total number of photons 
at low magnitudes, and become highly subdominant for 
intermediate and high magnitudes. Thus, in a photon- 
based approach, the execution time will be dominated 
by stars generation. The same comparison between stars 
and galaxies can be done for the number of pixels: stars 
dominate at low magnitude, before becoming largely 
subdominant. Contrary to the number of photons, the 
number of pixels from galaxies per square degree and 
per magnitude increases linearly with magnitude, mean- 
ing that going deeper significantly affects the execution 
time. In the pixel-by-pixel approach, the execution time 
is dominated by galaxies. 

Table Q] gives the number of photons and pixels that 
one needs to sample to simulate a 0.25 deg 2 image with 
an 450 seconds exposure time, from magnitude 12 to 
magnitude 29. These numbers confirm the conclusions 
from Fig. Q~l the number of photons is highly dominated 
by stars' photons, and the number of pixels is domi- 
nated by galaxies. Furthermore, the number of photons 
needed to simulate galaxies is of the same order than the 



1 We find that a factor 5 refinement is a bare minimum, and as such, 
is a lower bound on what can be used. The bigger this factor, the more 
precise and expensive the simulation. 



Table 1 : Number of photons and pixels sampled for one UFig simula- 
tion. 





Number of photons 


Number of pixels 


Galaxies 


7xlO y 


5.6 x 10 y 


Stars 


5.1 x 10 10 


3.4 x 10 8 


Total 


5.8 x 10 10 


6x 10 9 



needed number of pixels. 

3.2.3. Conclusion 

The photon-based approach requires less operations 
per elementary building block (photons), than needed 
by the pixel-based approach. Furthermore, many more 
photons than pixels are needed to simulate stars. On 
the other hand, the numbers of photons or pixels to be 
sampled in either approach to simulate galaxies are sim- 
ilar. Note that according to Fig. [1] we would have to 
simulate less pixels than photons, were we to simulate 
galaxies up to magnitude Rc * 26 (corresponding to 
the Subaru telescope limiting magnitude); however, we 
need to take into account fainter galaxies, which affect 
the image's noise, and we choose Rc = 29 as our higher 
magnitude. 

For these reasons, we have decided to adopt a hybrid 
approach: we simulate galaxies with a photon-based ap- 
proach, and stars with a pixel-based approach. 

4. Implementation 

4.1. Galaxies 

We simulate a circular Sersic galaxy by sampling the 
radial position r of its N® photons (Eq. IA.8I links a 
galaxy's magnitude and its number of photons) from 
a y-distributed random variable Y (r = PVso/fe"), and 
their angular positions from a uniform distribution be- 
tween and 2n (see |Appendix A) . Then, we transform 
the galaxy so that it becomes elliptical with the desired 
ellipticity, as shown in the appendix. 

Finally, the galaxy is pixelized by truncating the coor- 
dinates of the photons' positions. Note that we truncate, 
and not round, the coordinates because a pixel can be 
seen as a bucket (e.g., x = 2.8 corresponds therefore to 
pixel number 2). 

4.2. PSF-induced photon displacement and stars 

The effect of the PSF on a incident photon is to dis- 
place it, the corresponding displacement following a 
p.d.f defined by the PSF profile. This can be seen by 
considering stars as point-like sources, which would ap- 
pear as a Dirac function in the absence of a PSF; their 
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observed shape (the PSF itself) is therefore the conse- 
quence of the displacement of the photons making up 
the oncoming Dirac function. The displacement dX due 
to the PSF is then obtained by uniformally sampling the 
Moffat profile's cumulative distribution function (c.d.f) 

dX = a ^[cdf(Y) - l] 1 ^ - 1 (see |Appendix B) . We 
use this technique to convolve galaxies with the PSF. 

As shown in Sect. [3] stars account for most of the 
photons to be simulated, hence we create them with 
a pixel-based approach. To this end, we integrate the 
Moffat profile numerically in pixels with a 7th order 
Legendre-Gauss quadrature rule B27I1 . and we perturb 
each pixel's value with a Poisson deviate. We integrate 
the profile from the center outwards, until the probabil- 
ity of detecting one photon in a pixel is less than one 
percent. While much faster, this method is statistically 
equivalent to drawing photons one by one. Furthermore, 
as opposed to a pixel-based approach to galaxy gener- 
ation, creating stars pixel by pixel does not involve ex- 
pensive numerical convolution with the PSF, nor is it 
impacted by potential numerical errors coming from the 
convolution. 

4.3. Optimizations 

4.3.1. Random number generation 

Operations such as drawing the position of galaxies 
and stars, drawing galaxies' photons, or generating the 
background noise, imply heavy use of random number 
generators. 



To enable the creation of the 
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dom numbers needed to simulate galaxiefl we im- 
plemented the lagged Fibonacci generator xi - 
fc-21034 + ^i-44497) mod 2 32 la, |7[], that we initialize 
with the mersenne twister generator implemented in the 
boost::mtl9937 random number generator ll26ll . The 
lagged Fibonacci generator is buffered, meaning that we 
generate a full lag of 44497 numbers at once instead 
of generating them only when needed. Finally, another 
advantage of the lagged Fibonacci generator is that it 
allows us to use vectorization (see below) to generate 
the lag. We tested our generator with the Test U01 test 
suite, which consists of 160 tests [22]: all tests were 
passed with a p-value inside the range [10" 4 ,1 - 10" 4 ]; 
tests with a p-value outside of this range would be con- 
sidered as failures. 



2 This number corresponds to four times the number of photons 
that must be generated (see Table since a photon requires four 
random numbers: position with respect to the galaxy's center (radial 
and angular) and displacement due to the PSF (radial and angular). 



4.3.2. Parallelization 

We start by generating a catalog of stars and galaxies, 
where astrometry, photometry and shape information is 
stored. This task is done with an openmp loop. 

Then, we implement multithreading in the following 
way. Galaxies are sorted by position (each thread thus 
dealing with a well defined region of the full image), 
in such a way that each thread gets the same number 
of photons; in this way, all threads run in the same 
time (a very bright galaxy does not impair the speed 
of a given thread). Stars are sorted by position, by 
making sure that each thread gets the same number of 
stars to generate. Since galaxies and stars are sorted 
by position, each thread works safely on its own part 
of the simulation, completely independently from the 
other threads. Therefore, we do not need critical section 
(openmp locks) to write to the global array making up 
the image. Locks are further avoided by forcing each 
thread to have its own set of random number generators 
(independent of other threads' random number genera- 
tors). 

The remaining tasks (noise generation, magnitude 
rescaling, image resampling) are parallelized using 
openmp. 

4.3.3. Other optimizations 

Approximation of functions. We use linear interpola- 
tions to common functions such as trigonometric func- 
tions or T function. This significantly speeds up our cal- 
culations. 

Vectorization. Streaming SIND Extensions (SSE) al- 
lows us to perform four floating point calculations at 
once: we use it for galaxy generation, noise generation 
and image resampling. We checked that using floating 
point values instead of double points value does not im- 
pact the precision. 

5. Quality assurance 

Fig. |2]compares a patch of a UFig simulation (lower 
panel) with a patch of a typical Subaru image (upper 
panel). Both patches are of the same size, and the dy- 
namic scale is the same in both panels. Visually, the 
shape and size of galaxies are well rendered by our sim- 
ulations. Moreover, the granularity and spatial corre- 
lation of the background is comparable to that of the 
real image. We simulate correlated noise by resampling 
the original simulation, whose background noise is un- 
corrected, normally distributed, with a Lanczos resam- 
pling. This resampling is fast (see Sect[6]), and allows us 
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Figure 2: Top: typical Subaru image. Bottom: UFig simulation. Both patches have an area of 400x300 pixels 2 , or 1.3x1 arcmin 2 . 
The dynamic scales are the same for both panels. 
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to mimic the resampling done in the processing of real 
data. 

In Fig. [3] we show the distribution of pixels in the 
real image (black solid line) and in the simulation (red 
dashed line). The distributions agree well, especially for 
values near zero, corresponding to background pixels, 
and for high-value pixels, corresponding to bright ob- 
jects. Around zero, the distributions are well rendered 
by Gaussian distributions, hence confirming that gener- 
ating the background noise from a Gaussian deviate is 
correct. The flat distribution of negative values for the 
real image are created when processing raw data single 
exposures into a resampled, coadded image. Although 
they are not reproduced with the original Gaussian noise 
model used in UFig (not shown on the figure), we have 
checked that they appear when we simulate raw data 
single exposures that we process to obtain a final coad- 
ded image. This confirms that they are indeed due to the 
data processing. Resampling the (uncorrelated Gaus- 
sian noise-) simulated image with a Lanczos resampling 
allows us to better mimic the data reduction process and 
to better reproduce the background noise, while avoid- 
ing an expensive data reduction of simulated raw im- 
ages. 

Fig. [4] shows the distribution of stars and galaxies 
in the magnitude-size plane, both for a typical Subaru 
image (upper panel) and our simulation (lower panel). 
Magnitudes are given by SExtractor, and we define the 
half-light radius ee50 as SExtractor' sFLUX_RADIUS 
(note that all SExtractor's parameters are the same 
when running SExtractor on the real image and on 
the simulation, preventing any difference from the SEx- 
tractor analysis). We also rely on SExtractor to per- 
form the galaxy-star separation; to this end, we set SEx- 
tractor's SEEING_FWHM equal to the seeing input 
in the simulation (0.6" in the case shown by Fig. |4j, 
and we further set SExtractor's CLASS_STAR=0.9. 
Galaxies are shown by the black symbols, while stars 
are shown by the green symbols. The star branch is nar- 
rower in the simulation than in the real image because 
we forced the PSF to be constant in the simulation. 

Despite the simple models used, UFig produces real- 
istic images, that are consistent with real images. 

6. Computational performance 

6.1. Execution time 

We tested the performance of UFig on a laptop (mac- 
Book Pro with a 2.7 GHz Intel processor, 4 cores and 
16 GB RAM). Using eight threads, UFig provides a 
10kx8k-pixels-image (approximately the size of a typ- 
ical Subaru image) in 30 seconds. This is smaller than 
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Figure 3: Pixels distribution. The solid black line is for a typi- 
cal Subaru image, and the dashed red line is for a UFig simu- 
lation. 

the execution time of SExtractor on a similar image. 
For comparison, it took several hours to create a similar 
image using Skymaker and S image. 

We further measured that UFig uses 66% of the lap- 
top's peak performance (29 Gflops out of 43.2 peak 
Gflops). 

In the remainder of this section, we discuss how the 
UFig execution time depends on some input parameters, 
and how the time spending is distributed between differ- 
ent tasks. Fig. [5]shows how the execution time to create 
an image with exposure time f exp = 450s depends on 
the image's size, when using eight threads. The black 
thick solid line shows the total execution time. The red 
thick short dashed line shows the time spent sampling 
galaxies, while the green thick dash-dot line shows that 
spent simulating stars. The dash-dot-dot-dot line corre- 
sponds to the time needed to generate noise and the long 
dashed line to the time spent resampling the image. Di- 
amonds represent the time spent writing the image and 
the corresponding catalog to the disk. Finally, the dot- 
ted line shows the overheads (defined as all tasks not 
directed directly at creating or writing the simulation). 
For big enough images (more than 10 7 pixels), most of 
the time is spent drawing galaxies, then drawing stars. 
For smaller images, the execution time is dominated by 
overheads. Excepted the overheads, that are expected 
not to depend strongly on the image size, all tasks' time- 
spending show a clearly linear dependence on image 



7 



Real image 




22 
mag 



100.0 



10.0 



1.0 



0.1 



_'l I I 

Overall time 


'1 1 ' ' 1 i 


Galaxy sampling 


/,' 






N ois ^ ° ene rat i o n 


ft 


oi/o 




Resampling 


/ t 


Overheads 


ft 








/ ' / 
















/ * 4 




/ * A 








/ ' ft ■'" - 




ft <&■•■'- 




ft r /_ 








t ft / / - 




! Jr" '.S' - 
i 'ft / 


- i 
i 


f / h 


i 


/ / / - 


■ v 


/ / / : 


• ' '■• / 


/ 




/ 


/.' // 


/ 


1 -.vl /' ,// , ,, 


.1 /.i 



U Fig simulation 



10 4 10 5 10 6 10 7 10 8 
§ of pixels 



Figure 5: Execution time vs image size, using eight threads. 
Black solid line: overall time. Red short dashed line: galaxy 
sampling. Green dash-dot line: stars sampling. Dash-dot-dot- 
dot line: noise generation. Diamonds: image and catalog writ- 
ing to disk. Long dashed line: image resampling. Dotted line: 
overheads 
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Figure 4: Magnitude-size distribution for a typical Subaru im- 
age (upper panel) and for a UFig simulation (lower panel). 
Black points represent galaxies, and green points represent 
stars, galaxies and stars being separated by SExtractor. 



size. This behavior is expected, since the number of 
photons (for galaxy generation) and pixels belonging to 
stars (for star generation) increase linearly with the im- 
age size. Similarly, all other tasks, by definition, depend 
linearly on the number of pixels. 

Fig. [6] shows the relation between the exposure time 
used in the simulations, and the execution time, when 
using eight threads. Lines have the same meaning as 
in Fig. [5] The time spent resampling the image is not 
shown, since it is less than one second. For small ex- 
posure times (less than 200 seconds), most of the time 
is spent writing to disk, whereas generating galaxies 
is most expensive for large exposure times. Generat- 
ing galaxies and stars both scale linearly with exposure 
time. This is expected; the number of photons that one 
has to simulate to generate galaxies obviously depends 
linearly on the exposure time. So does the flux, and 
therefore the number of pixels that one has to draw when 
creating stars. All other tasks do not depend on the ex- 
posure time. 

6.2. Parallelization 

Fig. Q shows how the execution time depends on the 
number of threads used for the computation, when sim- 
ulating a 10,000x8,000 pixels image with an exposure 
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Figure 6: Execution time vs exposure time, using eight threads. 
Line styles and colors are the same as in Fig. [5] 



Figure 7: Execution time vs number of threads. Diamond-solid 
line: overall execution time. Thin solid line: overheads. 



time f exp = 450 s on the same laptop as that used above. 
The lower curve corresponds to the overheads, as de- 
fined above. When using two threads, we need half as 
much time to create a simulation as when using only one 
thread. A significant gain in execution time can be seen 
when using up to five threads, before the execution time 
plateaus. This plateau is due to the fact that Intel hyper- 
threading is at work, meaning that when more than four 
cores are used, threads starts to compete against each 
other (our laptop having four physical cores - corre- 
sponding to eight virtual cores). We therefore expect 
an optimal parallelization to provide us with a factor of 
four improvement in execution time: this is indeed what 
we measure, meaning that our parallelization is nearly 
optimal. 

6.3. Memory management 

Independently of the number of threads used, a run 
of the code uses the amount of RAM memory needed to 
store one copy of the simulated image. For instance, for 
a 10,000x8,000 pixels simulation, 300 MB of memory 
are required. 



7. Conclusion and perspective 

By introducing the Ultra Fast Image Generator 
(UFig), we showed that it is currently possible to imple- 
ment very fast codes to simulate wide-field astronomical 
images. We showed that, using simple models, we can 



simulate realistic images which take observation con- 
straints, including the PSF and various sources of noise, 
into account. 

Combining analytic simple models with state-of-the- 
art computing optimizations allows us to produce the 
mock of a typical Subaru SuprimeCam image (0.25 
sq. deg, 10kx8k pixels) with a 5-cr (extended) limit- 
ing magnitude of Rc = 26 , in which we account for 
galaxies of magnitude up to Rc = 29, in 30 seconds 
when using a laptop (macBook Pro with a 2.7 GHz Intel 
processor, 4 cores and 16 GB RAM); thus, an average 
galaxy is simulated in 30 /is. This represents an im- 
provement of several orders of magnitude in execution 
time compared to the public softwares that we are aware 
of. It is also comparable with the execution time of SEx- 
tractor on a similar image; given the optimization of 
its implementation, as well as its extensive and inten- 
sive use, SExtractor can be taken as a standard, well 
optimized, code in astronomy, thus setting a timescale 
for any new software that is used in combination with 
it. UFig is thus complementary to end-to-end simula- 
tion codes which aim to model observational effects in 
great details, but with greater execution time. 

We have presented the implementation of the code, 
with an emphasis on the different optimizations that we 
use. In particular, we have found that the optimal solu- 
tion is to adopt a hybrid approach to generate galaxies 
and stars, where we create galaxies with a photon-based 
approach, and stars with a pixel-based approach. We 



9 



have also described how we optimize random number 
generation by implementing a lagged Fibonacci random 
number generator, how we parallelize the code using 
multithreading in which threads are completely inde- 
pendent, and how we approximate common functions 
and use SSE vectorization to speed up calculations. 

We have then shown that UFig's simulations are con- 
sistent with real images, using simple standard tests. 
Finally, we have investigated the performances of the 
code, where we have checked that the execution time 
scales as it should were the code perfectly optimized. In 
particular, we showed that the parallelization of the code 
is nearly optimal. Therefore, the UFig implementation 
reaches the limits of the current computation possibil- 
ities. This is highlighted by the usage of 66% of our 
laptop's peak performance by UFig. 

The current UFig implementation relies on simple 
models; although it produces images sufficiently real- 
istic for many applications, such as testing data pro- 
cessing codes, or calibrating photometry or astrometry 
codes, these simple models may need to be refined to 
use UFig for very high-precision analyses. Therefore, 
we plan to increase the UFig realism by including more 
cosmology in the code. For instance, we will increase 
the shape complexity of galaxies, allow them to be spa- 
tially clustered, as well as distributed in redshift, and we 
will add weak lensing (either from large-scale structures 
or from massive clusters). Using more complex models 
will have an impact on the code's performance, which 
we will assess and take into account to keep UFig opti- 
mal. 

Another application of UFig's speed is to calibrate 
a computer intensive measurement method. For in- 
stance, ll35tl and i20Tl showed that a promising approach 
to cosmic shear measurement pipelines is to calibrate 
them with image simulations (with observation condi- 
tions similar to those of the real data to analyze) to al- 
leviate systematic effects. Up to now, such a calibra- 
tion was time-consuming and thus limited. Hence, UFig 
opens a new window to improve on computational in- 
tensive measurement techniques, such as those used in 
weak lensing, or in transient searches, for which UFig's 
ability to efficiently simulate time series observations 
may prove to be central. 

Further details about UFig can be found at 



http://www.astro.ethz.ch/refregier/research/Software/ufig 



Appendix A. Galaxy model 

Appendix A.l. Sersic profile 

We describe galaxies by a Sersic profile (Eq. [TJ. To 
assess the distribution of the Sersic index for galaxies 



up to high redshift, we use the Advanced Camera for 
Survey General Catalog (ACS-GC - HH). Figure lA~8l 
shows the distribution that we extract from the catalog, 
as measured in the I-band, for galaxies with good pho- 
tometric redshifts and magnitude between 15 and 26. 
Colors code for different ranges of magnitudes: black 
for magnitude less than 20, red for magnitude between 
20 and 22, green for magnitude between 22 and 24, 
and blue for magnitude between 24 and 26. For bright 
galaxies, the distribution is clearly bimodal, as is well 
known (see e.g. IU4I1 ). This bimodality disappears for 
fainter galaxies. Whether this highlights physical pro- 
cesses in galaxy formation and evolution, or whether it 
is simply due to a selection effect, is beyond the scope 
of this paper: we are only interested in modeling the dis- 
tribution of Sersic indices as close to the observed one 
as possible. We find that the p.d.f of Sersic indices for 
bright galaxies (magnitude less than 20) is well repre- 
sented by: 

f(n) = exp(7V(0.3,0.5) + Af(1.6, 0.4)) + 0.2. (A.l) 

For faint galaxies (magnitude bigger than 20), we find 
that the p.d.f of Sersic indices if well described by: 



fin) = exp(Af(0.2, 1)) + 0.2. 



(A.2) 



This analytical description is shown for faint galaxies as 
a dashed line in Fig. IA.8I 

Appendix A.l.l. The circular Sersic profile seen as a 
y-distribution 
The Sersic profile is given by Eq. (Q}, and can be 
normalized to unity flux as: 



1,2/1 



/(#•) = 



27rnr^ F(2«) 



exp 



A- 



l/n 



(A.3) 



This profile can be seen as the p.d.f of the radial po- 
sition of photons from a circular galaxy. Then let X be 
the random variable describing the position of a photon. 
The probability to find the photon in the shell [X, X+dX] 
from the center of the galaxy is given by 

fx(X)dX = 2nXI(X)dX. (A.4) 

/ y \ \jn 

Defining the random variable Y — k ( f- j , it can be 
shown that it follows a y-distribution of shape parameter 
In: 



MY) 



T(2n) 



(A.5) 
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Figure A. 8: Distribution of Sersic indices from the ACS-GC catalog 
flal . for different magnitude ranges. Histograms are scaled such that 
their maximum is 1 . The dashed line shows the analytical description 
input in UFig for galaxies with magnitude bigger than 20. 



Figure A. 9: Cumulative distribution of galaxies' magnitude in the R- 
band, from several surveys. The fitting function (dashed line) is de- 
fined by Eq. (AJ\ and TablelA2l 



Appendix A.1.2. From a circular to an elliptical galaxy 
We define a galaxy's ellipticity (£1,62) through its 
quadrupoles7 (J asei+i£2 = (hi- Jn+'ZiJn) I (J u+ J 22) 
A circular galaxy can be made elliptical through the 
transformation (xg,yg) = A(xc,yc), where (xs,ys) and 
(xc, yc) are the photon's coordinates in the elliptical and 
circular galaxy respectively, and A is the transformation 
matrix (for ||e|| + 0): 



1 



Table A. 2: Fitting coefficients for galaxy cumulative counts. 



A = — x 
V2 



log 10 (JV < mag) = Yj «/( ma S - 23 ) ! '< ( A - 7 ) 



Filter 


a 


a\ 


«2 


Rc 


4.300 


0.383 


-0.00766 


I 


4.579 


0.360 


-0.0229 


Z 


4.558 


0.410 


-0.0248 



'sign(e 2 ) VTTlRi tJiT~{± 1 -VT^H^/i-^ 
, Vi+TRI si g n (e2 ) VTHM A /T^^ 

(A.6) 

If ||e|| - 0, A is the identity matrix. 



Appendix A.2. Magnitude distribution 

We parametrize the magnitude distribution of galax- 
ies from galaxy counts in different surveys, such as the 
VIRMOS Descartes JH, COSMOS M, SXDS (3, 
and the Hershell Telescope and Hubble Deep fields [30]. 
We compile the cumulative counts of these surveys, and 
fit the resultant overall counts with a polynomial of the 
form 



as shown by Fig. IA.9l for counts in the R-band. Table 
IA.2l summarizes the coefficients a, for different filters. 

The number of photons making up a galaxy of mag- 
nitude mag, for a single exposure time f e xp, in the AB- 
magnitude system, is given by: 



1 0- 26 f exp AS — T 1 o - 4 ^ 9 ^™^-™^) , 
hA 



(A.8) 



where AS is the effective telescope's mirror's surface, A 
is the filter's central wavelength, AA the filter's width, ft 
is the atmospheric extinction, h is the Planck constant, 
and T is the total throughput of the observation system 

— ^mirror ^camera ^filter ^corrector- 

This number can be rescaled to simulate a galaxy on 
a coadded image with magnitude zero-point mag : 



N' = N® 1 o°- 4(mago_mag o )_2 - 5 log( '" 



(A.9) 



where mag^ 7 ' is the instrument magnitude zero-point, 
i.e. the magnitude corresponding to a flux of 1 ADU 
for a one second exposure time in the same observing 
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conditions, 

magi 7 ' = 8.9 + ySairmass - 2.5 log — — , 

e £ io- 26 Asffr 

(A. 10) 

where gain is the CCD's gain and Qe its quantum effi- 
ciency. 

Appendix A.3. Magnitude-size relation 

We use the ACS-GC catalog to parametrize the rela- 
tion between the apparent magnitude and apparent size 
of galaxies. In this catalog, galaxies' are estimated 
by fitting a Sersic profile. Since this catalog is shallower 
than the public data we used to estimate the magnitude 



distribution ( Appendix A. 2 1, we do not use it to param- 
eterize the magnitude distribution, but use that derived 
above. The ACS-GC data are therefore used only to an- 
alytically describe the relation between the magnitude 
and the size of galaxies. We find that in the mag r - r^ r 
plane, where mag,, and r^j are the magnitude and the 
size rotated such that they become uncorrelated, and 
shifted such that they have zero mean, the distribution 
of the log of the size at a given magnitude mag,, is 
well fitted by a Gaussian. Additionally, given that the 
correlation angle between the size and the magnitude is 
small, we checked that the magnitude distribution de- 



rived in Appendix A. 2 for the unrotated magnitudes is 
still a good fit to that of the rotated magnitudes mag r 
Thus, we set a galaxy's magnitude and size such that: 



mag 



cos 8 sin 9 
- sin 6 cos 6 



mag r / mag p 
rso, r ) \ r 50 , p 

(A. 11) 

where mag p and r^ p set the pivot point around which 
magnitudes and sizes are rotated. They are estimated 
from the magnitude and size means of COSMOS, and 
are set to mag ;:l = 25.309 and log r^ p = -0.796 arcsec. 
The correlation angle is set to 5.7 deg. The rotated 
magnitude mag,, is drawn from the distribution (1A.7I >. 
and the log of the rotated size r^ r is drawn from a nor- 
mal distribution of zero mean and standard deviation 
0.19 arcsec. 

We checked that the magnitude-size relation does not 
depend significantly on the observing band, and there- 
fore, we restrict its parametrization to what is presented 
here. 



Appendix B. Stars and Point Spread Function mod- 
els 

We use a Moffat profile to account for the PSF. The 
(circular) Moffat profile is defined in Eq. (0, where Iq 



is the value at the origin (r = 0), and a and f3 are scale 
parameters depending on the observation's conditions. 
For instance, for realistic atmospheric turbulences, /? = 
4.765 13^1 . We use ft — 2.6 to account for instrumental 
effects (especially diffraction). The profile's width a is 

and to its r^o by 



related to its FWHM by a 

a = r5 ° 

Contrary to the case of the Sersic profile, the Moffat 
profile cannot be linked to a usual known distribution, 
from the p.d.f of which we can easily estimate the dis- 
placement to apply to a given photon in UFig. There- 
fore, we sample the displacement by inverting the c.d.f 
of the profile. 

The Moffat profile, seen as the normalized p.d.f 
fx(X), where the random variable X corresponds to the 
photon's displacement, is given by: 



fx(X)dX = 



2(fi-l)X 



1 + 



(!)' 



dX. 



(B.l) 



Defining the variable Y = 1 + (X/a) 2 , so that the p.d.f 
of Y is f Y (Y) - (J3- \)Y~P for Y > 1, and integrating it, 
we obtain the c.d.f of Y: 



cdf(Y) = 1 - F 1 ^. 



(B.2) 



The displacement due to the PSF is then obtained by 
inverting Eq. dB.2b . the c.d.f. itself being uniformally 
sampled: 



X 



cdf( Y)- rp - 1. 



(B.3) 



We parametrize the stars' magnitude distribution with 
a polynomial fit to the Milky Way model derived in [ 36] . 
Given the position in the sky of the image we want to 
simulate, we extract the stars' magnitude distribution 
from the corresponding online application ll3~7ll . 



Appendix C. Noise model 

Poisson noise is automatically and naturally ac- 
counted for when simulating galaxies in a photon-based 
approach. We add Poisson noise for stars, and ac- 
count for the noise from various sources, like the sky 
brightness, the readout noise and data processing, with 
a Gaussian random deviate with zero mean and standard 
deviation (see e.g. lfl7ll or I29IO : 



0~N 



\ l R0N \ 2 ■ Fsk y i 'f (r 

\ «exp — + + /dp, (C. 

\ \ gain / n exp gain 



1) 
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where RON is the readout noise of the camera, F s ^ y is 
the sky brightness in ADUs, « exp is the number of expo- 
sures out of which the coadded image is assumed to be 
done and /d p describes the noise coming from the data 
reduction (including, but not limited to, fiat-fielding in- 
accuracies). 

It should be noted that independently of the hybrid 
approach we use to simulate galaxies and stars, we treat 
the background noise at the pixel level, and therefore 
add it in ADUs to the noiseless image. Finally, we re- 
sample our simulations with a Lanczos filter to better 
mimic the data reduction process. 
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